Abstract: We study a new class of codes for Gaussian multi-terminal
source and channel coding. These codes are designed
using the statistical framework of high-dimensional linear regres- 
sion and are called Sparse Superposition or Sparse Regression 
codes. Codewords are linear combinations of subsets of columns 
of a design matrix. These codes were introduced by Barron and 
Joseph and shown to achieve the channel capacity of AWGN 
channels with computationally feasible decoding. They have 
also recently been shown to achieve the optimal rate-distortion 
function for Gaussian sources. In this paper, we demonstrate how 
to implement random binning and superposition coding using 
sparse regression codes. In particular, with minimum-distance 
encoding/decoding it is shown that sparse regression codes attain 
the optimal information-theoretic limits for a variety of multi- 
terminal source and channel coding problems. 

I. Introduction 

Among the important outstanding problems in network 
information theory is developing codes for various multi- 
terminal source and channel models that are provably rate- 
optimal with computationally efficient encoding and decoding 
algorithms. The introduction of deep ideas such as superposition
[1], random binning [2] and auxiliary random variables
[3]-[5] has led to a sharp characterization of information-theoretic
limits for several network problems. However, until
recently, even the best feasible codes for these problems fell 
short of these limits. 

There have been some recent breakthroughs that begin 
to bridge this gap. Polar codes were the first codes with 
computationally feasible encoding algorithms that were shown 
to provably attain the information-theoretic limit for discrete-alphabet
symmetric sources and channels [6]-[9]. Spatially
coupled ensembles have recently been shown to achieve the 
capacity of binary-input symmetric-output channels with belief 
propagation decoding [10]. There are many important communication
settings where the source or channel alphabet is
inherently continuous, notably Gaussian sources and AWGN 
channels. Elegant techniques such as lattice coding have been 
proposed for continuous-alphabet source and channel coding 
[11]-[13], but these rate-optimal coding schemes do not have
feasible encoding and decoding algorithms. 

Recently a class of codes called Sparse Superposition Codes 
or Sparse Regression Codes (SPARC) was introduced by 
Barron and Joseph [14]-[16] for communication over the
AWGN channel. In [16], it was shown that SPARCs achieve
the AWGN channel capacity with a computationally feasible
decoding algorithm. SPARCs have also been shown to attain
the optimal rate-distortion function of Gaussian sources with
feasible algorithms [17]-[19]. In this paper, we show that the
sparse regression framework can be used to design feasible
codes for various Gaussian multi-terminal source and channel
models.

Fig. 1: Source and Channel Coding with Side-Information

The basic ingredients of the constructions used to prove 
coding theorems for many multi-terminal problems are: 

1) Rate-optimal point-to-point source and channel codes, 

2) Random binning, 

3) Superposition coding. 

As mentioned above, it has been shown in [14], [16], [18] that
SPARCs are rate-optimal for Gaussian channels and sources. 
In this paper, we show that source and channel coding SPARCs 
can be combined to implement binning and superposition, 
thus yielding a new class of codes for multi-terminal source 
and channel coding. To illustrate how SPARCs can be used 
for binning, we consider the canonical examples of source 
coding with decoder side-information (the Wyner-Ziv problem
[4]) and channel coding with encoder side-information
(the Gelfand-Pinsker problem [3], [20]). These problems are
depicted in Figure 1. Superposition coding using SPARCs is
a natural extension of point-to-point coding and is illustrated 
via the Gaussian multiple-access and broadcast channels. 



In Sections II and III, we review the SPARC construction
and the minimum-distance performance results for source and
channel coding. In Section IV, we describe how to implement
random binning using SPARCs and use it to construct
codes for the Wyner-Ziv problem (Figure 1a). The standard
construction for this problem consists of a high-rate source
codebook partitioned into bins, each of which serves as a
lower-rate channel code. Due to the importance of the problem,
several practical code constructions have been proposed, e.g.,
[21]-[24], but they generally fall short of the Wyner-Ziv
bounds and do not come with provable performance
guarantees. Recently, polar codes have been proposed for
Wyner-Ziv coding [7], [8]. These are the first computationally
efficient code constructions that are provably rate-optimal.
However, they are only applicable to problems where the
source and side-information distributions are discrete and
symmetric. Elegant coding schemes such as those based on
lattices have been proposed [11], [25] for the Wyner-Ziv
problem with continuous-valued source and side-information,
but they have exponential encoding and decoding complexity.

In Section V, we turn our attention to channels with state,
where the state information is known non-causally at the
transmitter. This model (Figure 1b) has been studied widely in
the literature [5], [11], [20], [26] and has found many practical
applications such as multi-antenna communication [27], digital
watermarking [28], [29] and steganography [30]. It is the
channel coding dual of the Wyner-Ziv problem [31], [32]. In
Figure 1b, the encoder knows the entire state sequence S at
the beginning of communication while the decoder observes
only the channel output Y. The capacity of this channel model
was determined by Gelfand and Pinsker [5]. For the important
special case of AWGN channels with Gaussian state, Costa
[20] showed that the Gelfand-Pinsker capacity is the same as
the rate achievable when the decoder has full knowledge of
S. Since Costa's discovery of this surprising result (dubbed
'writing on dirty paper'), elegant capacity-achieving coding
schemes have been developed such as nested lattice codes [11],
[26], [29], but these are generally computationally infeasible.
Several computationally efficient code designs have also been
proposed, e.g., [33], [34]; however, they do not come with
provable rate guarantees. In Section V, we show how to
implement Costa's coding scheme by partitioning a high-rate
SPARC channel code into bins of lower-rate source codes.



Finally in Section VI we show how to construct capacity- 
achieving codes for the AWGN multiple-access and broadcast 
channels using SPARCs. We show that superposition codes for 
these channels can be implemented through a simple extension 
of SPARCs for point-to-point channel coding. 

The analysis of SPARCs in this paper is presented with
minimum-distance encoding and decoding, which is optimal
but computationally inefficient. This is mainly to keep the exposition
simple and to highlight the main contribution of the paper:
a demonstration that binning and superposition can be easily
implemented with sparse regression ensembles described by
compact dictionaries. The results also hold with the feasible
SPARC encoders and decoders developed in [16], [17], [19].
Further, we focus only on the achievability of the optimal
information-theoretic rates and do not discuss the SPARC
error exponents obtained in [14], [18]. These aspects will be
discussed in an extended version of this paper.

Fig. 2: $A$ is an $n \times ML$ matrix and $\beta$ is an $ML \times 1$ binary vector.
The positions of the non-zeros in $\beta$ correspond to the gray columns
of $A$ which add to form the codeword $A\beta$.

Notation: Upper-case letters are used to denote random vari- 
ables, lower-case for their realizations, and bold-face letters to 
denote random vectors and matrices. All vectors have length 
n. $\|X\|$ denotes the $\ell_2$-norm of vector X, and $|X| = \|X\|/\sqrt{n}$
is the normalized version. We use natural logarithms, so 
entropy is measured in nats. To limit the number of symbols 
introduced, we reuse notation across sections. For example, X 
is used to represent the channel input as well as the source; 
Y is used to denote both the channel output and the source 
side-information. The model description at the beginning of 
each section explains all the variables used in it. 

II. Sparse Regression Codes 

A sparse regression codebook (SPARC) is defined in terms
of a design matrix $A$ of dimension $n \times ML$ whose entries are
i.i.d. $\mathcal{N}(0, 1)$, i.e., independent zero-mean Gaussian random
variables with unit variance. Here n is the block length and
M and L are integers whose values will be specified shortly
in terms of n and the rate R. As shown in Figure 2, one
can think of the matrix A as composed of L sections with
M columns each. Each codeword is a linear combination of
L columns, with one column from each section. Formally, a
codeword can be expressed as $A\beta$, where $\beta$ is an $ML \times 1$
vector $(\beta_1, \ldots, \beta_{ML})$ with the following property: there is
exactly one non-zero $\beta_i$ for $1 \le i \le M$, one non-zero $\beta_i$ for
$M + 1 \le i \le 2M$, and so forth. Denote the set of all $\beta$'s
that satisfy this property by $\mathcal{B}_{M,L}$. The non-zero values of $\beta$
are all set equal to $c = \gamma/\sqrt{L}$, where $\gamma$ will be specified later
depending on the problem at hand.

Since there are M columns in each of the L sections, the
total number of codewords is $M^L$. To obtain a rate of R
nats/sample, we therefore need

$$M^L = e^{nR}. \tag{1}$$

There are several choices for the pair (M, L) which satisfy
this. For example, $L = 1$ and $M = e^{nR}$ recovers the
Shannon-style random codebook; here the number of columns
in the dictionary A is $e^{nR}$, i.e., exponential in n. For our
constructions, we choose $M = L^b$ for some $b > 1$ so that (1)
implies

$$L \log L = nR/b. \tag{2}$$

Thus L is now $\Theta(n/\log n)$, and the number of columns ML
in the dictionary A is now $\Theta\big((n/\log n)^{b+1}\big)$, a polynomial in n.
This reduction in dictionary complexity can be harnessed to
develop computationally efficient encoders and decoders for
the sparse regression code.
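
To make the scaling implied by the relation L log L = nR/b concrete, the following minimal sketch solves that relation numerically and compares the resulting dictionary size with the exponentially many columns of a Shannon-style codebook. The function names and the parameter values (n = 200, R = 1, b = 2) are illustrative assumptions, and L is treated as a real number rather than rounded to an integer.

```python
import math

def sparc_num_sections(n, R, b):
    """Solve L * log(L) = n * R / b for real L >= 1 by bisection (natural log)."""
    target = n * R / b
    lo, hi = 1.0, 2.0
    while hi * math.log(hi) < target:   # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dictionary_columns(n, R, b):
    """Number of columns M * L = L ** (b + 1) when M = L ** b."""
    L = sparc_num_sections(n, R, b)
    return L ** (b + 1)

# Illustrative values only (not from the paper).
n, R, b = 200, 1.0, 2.0
L = sparc_num_sections(n, R, b)
sparc_cols = dictionary_columns(n, R, b)   # polynomial in n
shannon_cols = math.exp(n * R)             # exponential in n (the L = 1 case)
print(L, sparc_cols, shannon_cols)
```

The comparison is only meant to show the order-of-magnitude gap between the two dictionary sizes for one choice of parameters.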

Since each codeword in a SPARC is a linear combination of 
L columns of A (one from each section), codewords sharing 
one or more common columns in the sum will be dependent. 
Also, SPARCs are not linear codes since the sum of two 
codewords does not equal another codeword in general. 
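
The construction above can be sketched in a few lines. This is a toy illustration under stated assumptions: the helper names `make_design_matrix` and `codeword`, the tiny dimensions, and the choice of non-zero value 1/sqrt(L) are all hypothetical, and column choices are zero-based.

```python
import math
import random

def make_design_matrix(n, M, L, seed=0):
    """n x (M*L) matrix with i.i.d. N(0,1) entries, stored as a list of rows."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(M * L)] for _ in range(n)]

def codeword(A, picks, M, c):
    """Return A @ beta, where beta equals c at column picks[i] of section i
    (zero-based within each section) and zero elsewhere."""
    cols = [i * M + p for i, p in enumerate(picks)]   # global column indices
    return [c * sum(row[j] for j in cols) for row in A]

# Toy parameters, chosen only so the example runs instantly.
n, M, L = 8, 4, 3
A = make_design_matrix(n, M, L)
c = 1.0 / math.sqrt(L)                 # illustrative value of the non-zero entries
x = codeword(A, [2, 0, 3], M, c)       # one column chosen from each of the L sections
num_codewords = M ** L                 # one codeword per choice of (p_1, ..., p_L)
rate = math.log(num_codewords) / n     # nats per sample
```

Storing only the n x ML matrix is enough to generate all M^L codewords, which is the compact-dictionary property the section describes.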

III. SPARC for Point-to-Point Source and Channel Coding

In this section, we review the performance of SPARCs 
for point-to-point source and channel coding under minimum 
distance encoding/decoding. 

A. Lossy Source Coding 

Consider an i.i.d. Gaussian source X with mean 0 and
variance $\sigma^2$. A rate-distortion codebook with rate R and
block length n is a set of $e^{nR}$ length-n codewords, denoted
$\{\hat{X}(1), \ldots, \hat{X}(e^{nR})\}$. The quality of reconstruction is
measured through the mean-squared distortion criterion

$$d_n(X, \hat{X}) = |X - \hat{X}|^2 = \frac{1}{n} \|X - \hat{X}\|^2,$$

where $\hat{X}$ is the codeword chosen to represent the source
sequence X. For this distortion criterion, an optimal encoder
maps each source sequence to the codeword nearest to it in
Euclidean distance. The rate-distortion function $R^*(D)$, the
minimum rate for which the distortion can be bounded by D
with high probability, is given by [35]

$$R^*(D) = \min_{P_{\hat{X}|X} :\, E(X - \hat{X})^2 \le D} I(X; \hat{X}) = \frac{1}{2} \log \frac{\sigma^2}{D} \ \text{nats/sample}. \tag{3}$$

For rates $R > R^*(D)$, a sparse regression codebook is
defined in terms of an $n \times ML$ design matrix A, as described
in the previous section. The non-zero values of $\beta \in \mathcal{B}_{M,L}$ are
all set equal to $\sqrt{(\sigma^2 - D)/L}$. Encoding and decoding are as
follows.

Minimum-distance Encoder: This is defined by a mapping
$g : \mathbb{R}^n \to \mathcal{B}_{M,L}$. Given the source sequence X, the encoder
determines the $\beta$ that produces the codeword closest in
Euclidean distance, i.e.,

$$g(X) = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M,L}} \|X - A\beta\|.$$

Decoder: This is a mapping $h : \mathcal{B}_{M,L} \to \mathbb{R}^n$. On receiving
$\hat{\beta} \in \mathcal{B}_{M,L}$ from the encoder, the decoder produces the
reconstruction $h(\hat{\beta}) = A\hat{\beta}$.
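
As a rough illustration of the minimum-distance encoder and the decoder, the sketch below searches all M^L codewords of a toy SPARC by brute force. It is only feasible for very small M and L, and it is not the computationally efficient encoder referred to in the paper; the helper names and the parameter values are assumptions made for this example.

```python
import itertools
import math
import random

def make_design_matrix(n, M, L, seed=0):
    """n x (M*L) matrix with i.i.d. N(0,1) entries, stored as a list of rows."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(M * L)] for _ in range(n)]

def codeword(A, picks, M, c):
    """Decoder map: A @ beta for the beta with value c at picks[i] in section i."""
    cols = [i * M + p for i, p in enumerate(picks)]
    return [c * sum(row[j] for j in cols) for row in A]

def min_distance_encode(A, x, M, L, c):
    """Exhaustive search for the column choices whose codeword is closest to x."""
    best_picks, best_dist = None, float("inf")
    for picks in itertools.product(range(M), repeat=L):
        y = codeword(A, picks, M, c)
        dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        if dist < best_dist:
            best_picks, best_dist = picks, dist
    return best_picks, best_dist

# Toy source-coding setup (illustrative values only).
n, M, L = 8, 4, 3
sigma2, D = 1.0, 0.5
c = math.sqrt((sigma2 - D) / L)          # non-zero value used for source coding
A = make_design_matrix(n, M, L, seed=1)
rng = random.Random(2)
x = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]

picks, dist = min_distance_encode(A, x, M, L, c)
x_hat = codeword(A, picks, M, c)         # reconstruction produced by the decoder
distortion = dist / n                    # normalized squared error |x - x_hat|^2
```

At these toy sizes the achieved distortion says nothing about the asymptotic guarantees; the point is only the structure of the search over one column per section.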



The probability of error at distortion-level D of a rate-distortion
code $\mathcal{C}_n$ with block length n and encoder and
decoder mappings g, h is

$$P_e(\mathcal{C}_n, D) = P\big(|X - h(g(X))|^2 > D\big). \tag{4}$$

It was shown in [18] that SPARCs can achieve the optimal
rate-distortion function with the optimal error exponents for
i.i.d. Gaussian sources for all distortions D such that $D/\sigma^2 < x^*$,
where $x^* \approx 0.2032$ is the solution in (0, 1) of the equation

$$1 + \frac{1}{2} \log x = x. \tag{5}$$

Fact 1: [18] For $D \in (0, \sigma^2)$, let $R_{sp}(D) =
\max\{\frac{1}{2} \log \frac{\sigma^2}{D},\ 1 - \frac{D}{\sigma^2}\}$. Fix rate $R > R_{sp}(D)$, $\epsilon > 0$ and
$b > b_1$ where

$$b_1 = \frac{2.5R}{R - 1 + D/\sigma^2}. \tag{6}$$

For all n, let $\mathcal{C}_n$ be a rate R SPARC defined by an $n \times L_n M_n$
design matrix with i.i.d. $\mathcal{N}(0, 1)$ entries, where $L_n$ is determined
by (2) and $M_n = L_n^b$. Then for all sufficiently large n,
$P_e(\mathcal{C}_n, D) < \epsilon$.

Fact 1 implies that SPARCs achieve the optimal rate-distortion
function for $0 < D/\sigma^2 < x^*$, where $x^* \approx 0.2032$ is the
solution of (5). For $x^* < D/\sigma^2 < 1$, the minimum achievable
rate of Fact 1, $1 - D/\sigma^2$, is larger than the optimal rate-distortion
function.
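
The threshold x* can be checked numerically. The sketch below finds, by bisection, the point in (0, 1) where the two terms in the maximum defining the Fact 1 rate coincide, that is, where (1/2) log(1/x) = 1 - x with x = D/sigma^2. The function names and the bracketing interval are assumptions chosen for this illustration.

```python
import math

def solve_x_star(lo=0.01, hi=0.5, iters=100):
    """Root in (lo, hi) of f(x) = 1 + 0.5*log(x) - x; f is increasing for x < 0.5."""
    f = lambda x: 1.0 + 0.5 * math.log(x) - x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_distortion(sigma2, D):
    """Optimal Gaussian rate-distortion function, in nats per sample."""
    return 0.5 * math.log(sigma2 / D)

def r_sp(sigma2, D):
    """Minimum rate guaranteed by Fact 1 for minimum-distance SPARC encoding."""
    return max(rate_distortion(sigma2, D), 1.0 - D / sigma2)

x_star = solve_x_star()
print(x_star)
# Below the threshold the two rates coincide; above it Fact 1 needs a larger rate.
print(r_sp(1.0, 0.1), rate_distortion(1.0, 0.1))
print(r_sp(1.0, 0.5), rate_distortion(1.0, 0.5))
```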

B. Communication over an AWGN Channel 

Consider an AWGN channel with input X and output Y 
defined by 

Y = X + Z 

where $Z \sim \mathcal{N}(0, N)$ is a noise variable independent of X.
There is an average power constraint P on the input X. Denote
by $v$ the signal-to-noise ratio P/N. It was shown in [14]
that SPARCs can achieve the capacity $\frac{1}{2} \log(1 + v)$ with the
probability of error decaying exponentially with n.

Encoder: This is a mapping $g : \mathcal{B}_{M,L} \to \mathbb{R}^n$. Each message
in the set $\{1, \ldots, M^L = e^{nR}\}$ is indexed by a unique $\beta \in
\mathcal{B}_{M,L}$. The non-zero values of $\beta$ are all equal to $\sqrt{P/L}$. To
transmit the message corresponding to $\beta$, the encoder produces
the channel input $X = A\beta$.

Minimum-distance Decoder: This is defined by a mapping
$h : \mathbb{R}^n \to \mathcal{B}_{M,L}$. Upon receiving the output sequence Y, the
decoder determines the $\beta$ that produces the codeword closest
in Euclidean distance, i.e.,

$$\hat{\beta} = h(Y) = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M,L}} \|Y - A\beta\|^2.$$

The average probability of error of a code $\mathcal{C}_n$ with block
length n and encoder and decoder mappings g, h is

$$P_e(\mathcal{C}_n) = \frac{1}{e^{nR}} \sum_{\beta \in \mathcal{B}_{M,L}} P\big(h(Y) \ne \beta \mid X = A\beta\big). \tag{7}$$



The performance of SPARC for channel coding is given
below.

Let $v^* \approx 15.8$ be the solution to $(1 + v^*) \log(1 + v^*) = 3v^*$.
Define

$$b_0(v) = \begin{cases}
\dfrac{4v(1+v)\log(1+v)}{[(1+v)\log(1+v) - v]^2}, & v < v^*, \\[2ex]
\dfrac{(1+v)\log(1+v)}{(1+v)\log(1+v) - 2v}, & v \ge v^*.
\end{cases} \tag{8}$$

We note that $b_0$ asymptotically approaches 1 with growing $v$.



Fact 2: [14] Fix rate $R < C = \frac{1}{2} \log(1 + v)$, $b > b_0(v)$
and $\epsilon > 0$. For all n, let $\mathcal{C}_n$ be a rate R SPARC defined by an
$n \times L_n M_n$ design matrix with i.i.d. $\mathcal{N}(0, 1)$ entries, where $L_n$
is determined by (2) and $M_n = L_n^b$. Then for all sufficiently
large n, $P_e(\mathcal{C}_n) < \epsilon$.
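
The minimum exponent b_0 required by Fact 2 is easy to evaluate numerically. The sketch below assumes the two-branch form b_0(v) = 4v(1+v)log(1+v) / [(1+v)log(1+v) - v]^2 for v below v*, and (1+v)log(1+v) / [(1+v)log(1+v) - 2v] otherwise, with v* the positive solution of (1+v)log(1+v) = 3v. The function names and the bisection bracket are illustrative choices.

```python
import math

def solve_v_star(lo=1.0, hi=100.0, iters=100):
    """Positive root of g(v) = (1+v)*log(1+v) - 3*v; g < 0 below it, g > 0 above."""
    g = lambda v: (1 + v) * math.log(1 + v) - 3 * v
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def b0(v, v_star):
    """Two-branch lower bound on the exponent b (with M = L**b) for channel coding."""
    t = (1 + v) * math.log(1 + v)
    if v < v_star:
        return 4 * v * t / (t - v) ** 2
    return t / (t - 2 * v)

v_star = solve_v_star()
print(v_star)
for v in (1.0, 5.0, v_star, 100.0, 1e4):
    print(v, b0(v, v_star))
```

Both branches evaluate to 3 at v*, so the bound is continuous there, and the second branch decreases slowly toward 1 as the signal-to-noise ratio grows.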

IV. SPARC for Lossy Compression with Decoder 
Side-Information 

In this section, we construct SPARCs to achieve the optimal
Wyner-Ziv rate for Gaussian sources. Consider an i.i.d.
Gaussian source $X \sim \mathcal{N}(0, \sigma^2)$ to be compressed with mean-squared
distortion D. The decoder side-information Y is a noisy
version of X and is related to X by $Y = X + Z$, where
$Z \sim \mathcal{N}(0, N)$ is independent of X. The sequence Y is
available at the decoder non-causally. If Y were available
at the encoder as well, the optimal strategy would be to compress
$Z = Y - X$ to within distortion D; the minimum rate required
for this is $\frac{1}{2} \log \frac{\mathrm{Var}(X|Y)}{D}$ nats/sample. Wyner and Ziv showed
in [4] that this rate is achievable even when Y is available at
only the decoder.

Before presenting the SPARC construction, we briefly review
the main ideas in the Wyner-Ziv random coding scheme
[4]. Define an auxiliary random variable U jointly distributed
with X according to

$$U = X + V, \tag{9}$$

where $V \sim \mathcal{N}(0, Q)$ is independent of X. The idea is that
the decoder first recovers U, and then produces $\hat{X}$ as the
best estimate of X given U and Y. The codebook consists
of length-n vectors chosen i.i.d. according to the marginal
distribution of U. The encoder attempts to find a codeword
U whose empirical joint distribution with X is close to (9).
From the rate-distortion theorem, this step will be successful
if the codebook size is at least slightly larger than $e^{nI(U;X)}$.
Since the decoder has Y, the index of the chosen codeword
U is not sent in its entirety; instead we divide the codebook
into $e^{nR}$ equal-sized bins and send only the index of the bin
containing the codeword. Thus the information rate to the
decoder is R nats/source sample which is less than the rate
I(U; X) required to convey the precise codeword index.

The decoder's task is to recover the codeword U using the
bin index and the side-information Y. This is equivalent to
a channel decoding problem. We can correctly distinguish U
from the other codewords in the bin if the number of codewords
in each bin is exponentially less than $e^{nI(U;Y)}$. Combining
this with the minimum codebook size for quantization, we see
that the number of bins $e^{nR}$ should satisfy

$$e^{nR} > \frac{e^{nI(U;X)}}{e^{nI(U;Y)}},$$

or

$$R > I(U;X) - I(U;Y) = \frac{1}{2} \log \frac{\mathrm{Var}(X|Y)}{D},$$

where the last equality is obtained by setting $Q =
\frac{\mathrm{Var}(X|Y)\, D}{\mathrm{Var}(X|Y) - D}$. After decoding U, the decoder reconstructs $\hat{X}$
as the MMSE estimate of X given (U, Y). It can be verified
that the expected squared-error distortion is D.
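
The rate identity above can be sanity-checked numerically. The sketch below assumes the jointly Gaussian model U = X + V with Var(V) = Q and Y = X + Z, for which I(U;X) = (1/2) log((sigma^2 + Q)/Q) and Var(U|Y) = Var(X|Y) + Q. The function name and the numerical values are illustrative, not from the paper.

```python
import math

def wyner_ziv_rates(sigma2, N, D):
    """Return (I(U;X) - I(U;Y), 0.5*log(Var(X|Y)/D)) for the Gaussian test channel."""
    var_x_given_y = sigma2 * N / (sigma2 + N)
    if not 0 < D < var_x_given_y:
        raise ValueError("need 0 < D < Var(X|Y)")
    Q = var_x_given_y * D / (var_x_given_y - D)   # variance of the test-channel noise V
    var_u = sigma2 + Q
    i_ux = 0.5 * math.log(var_u / Q)                      # Var(U|X) = Q
    i_uy = 0.5 * math.log(var_u / (var_x_given_y + Q))    # Var(U|Y) = Var(X|Y) + Q
    target = 0.5 * math.log(var_x_given_y / D)
    return i_ux - i_uy, target

binning_rate, wz_rate = wyner_ziv_rates(sigma2=1.0, N=0.5, D=0.1)
print(binning_rate, wz_rate)
```

With this choice of Q the two printed values agree up to floating-point error, which is the equality used in the text.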

We now show that the above coding scheme with binning
can be implemented with SPARCs. The relation (9) can be
equivalently written in terms of the reverse test channel as

$$X = aU + V', \tag{10}$$

where $a = \frac{\sigma^2}{\sigma^2 + Q}$ and $V' \sim \mathcal{N}\big(0, \frac{\sigma^2 Q}{\sigma^2 + Q}\big)$ is independent of U.
The first step of the coding scheme is equivalent to quantizing
the source sequence X to a codeword aU with mean-squared
distortion at most $\frac{\sigma^2 Q}{\sigma^2 + Q}$. We can use a SPARC to perform
this quantization by choosing a design matrix with parameters
satisfying the specifications in Fact 1.

Instead of sending the codeword index $\beta$ to the decoder in
its entirety, we divide each section of the design matrix A into
subsections of M' columns each, as shown in Figure 3, and
only send information to indicate which subsection in each of
the L sections of $\beta$ contains a non-zero. More precisely, we
send the decoder a tuple $(p_1, \ldots, p_L)$ where $p_i \in \{1, \ldots, M/M'\}$
indicates a subsection in the ith section of A. This strategy is
equivalent to binning: a bin is now a subset of the codebook
consisting of codewords corresponding to $\beta$'s with non-zeros in
the subsections specified by $(p_1, \ldots, p_L)$. The codebook is thus
divided into $(M/M')^L$ bins and the rate R required to send the
bin index to the decoder is determined as

$$e^{nR} = (M/M')^L. \tag{11}$$

We note that each bin is itself a smaller sparse superposition
codebook with $(M')^L$ codewords, defined by an $n \times M'L$ sub-matrix
of A.
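
The binning map described above amounts to integer division of each within-section column index by the subsection size. The following sketch uses zero-based indices (the text uses one-based indices) and hypothetical names `bin_index` and `offset_in_bin`; the values of M, M', L and n are toy choices.

```python
import math

def bin_index(picks, M_sub):
    """Subsection (bin coordinate) of the chosen column in each section."""
    return tuple(col // M_sub for col in picks)

def offset_in_bin(picks, M_sub):
    """Position of the chosen column inside its subsection, for each section."""
    return tuple(col % M_sub for col in picks)

# Toy parameters: sections of M = 16 columns, split into subsections of M' = 4.
M, M_sub, L, n = 16, 4, 3, 10
picks = (5, 12, 3)                      # chosen column in each section (zero-based)

p = bin_index(picks, M_sub)             # what the encoder sends
q = offset_in_bin(picks, M_sub)         # what the decoder must recover from Y
num_bins = (M // M_sub) ** L
bin_rate = math.log(num_bins) / n       # rate of the bin index, nats per sample
R1 = L * math.log(M) / n                # rate of the full codebook
R2 = L * math.log(M_sub) / n            # rate of the codebook inside one bin
print(p, q, num_bins, bin_rate, R1 - R2)
```

The last two printed quantities coincide, reflecting that the bin-index rate is the difference between the full codebook rate and the within-bin codebook rate.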

The decoder side-information variable Y is related to U as

$$Y = X + Z = aU + V' + Z, \tag{12}$$

where U, V' and Z are mutually independent. The problem
of recovering U from Y at the decoder is a channel decoding
problem over a channel with signal-to-noise ratio given by

$$\mathrm{snr} = \frac{a^2\, \mathrm{Var}(U)}{\mathrm{Var}(V') + \mathrm{Var}(Z)} = \frac{\sigma^4}{\sigma^2 Q + (\sigma^2 + Q) N}. \tag{13}$$

Since Fact 2 shows that SPARCs can achieve the
AWGN channel capacity, the decoder can recover
U with high probability if the number of codewords in each bin satisfies

$$(M')^L < \exp\Big(n \cdot \frac{1}{2} \log(1 + \mathrm{snr})\Big).$$

The above SPARC coding scheme and its performance are
formalized below.

Fig. 3: Each section is divided into subsections of M' columns. A bin is formed by specifying a subsection in each of the L sections as
shown by the shaded regions.



Definition 1 (Nested Sparse Regression Codebook): A
nested sparse regression codebook with rates $(R_1, R_2)$ and
block length n is defined by an $n \times ML$ design matrix A
with i.i.d. $\mathcal{N}(0, 1)$ entries, where $M^L = e^{nR_1}$. Each section of
M columns is divided into sub-sections of M' columns each,
where M' is determined by $(M')^L = e^{nR_2}$. The codebook
consists of codewords $A\beta$, where $\beta \in \mathcal{B}_{M,L}$ contains one
non-zero element in each of the L sections.

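Theorem 1: Fix $R_1 > \max\{\frac{1}{2} \log \frac{\sigma^2 + Q}{Q},\ \frac{\sigma^2}{\sigma^2 + Q}\}$ and $R_2 <
\frac{1}{2} \log(1 + \mathrm{snr})$ where

$$Q = \frac{\mathrm{Var}(X|Y)\, D}{\mathrm{Var}(X|Y) - D}$$

and snr is given by (13). Then for any $\epsilon > 0$ and all sufficiently
large n, there exists a rate $R_1 - R_2$ code $\mathcal{C}_n$ with $P_e(\mathcal{C}_n, D) <
\epsilon$ where $\mathcal{C}_n$ is defined by a nested sparse regression codebook
with rates $(R_1, R_2)$ whose $n \times ML$ design matrix satisfies the
following: $M = L^b$ where

$$b > \max\Big\{ \frac{2.5 R_1}{R_1 - \sigma^2/(\sigma^2 + Q)},\ \frac{R_1}{R_2}\, b_0(\mathrm{snr}) \Big\}$$

and L is determined by $bL \log L = nR_1$. (The function $b_0(\cdot)$
is defined in (8).)

Define

$$D^* = \frac{x^* \sigma^2}{1 + x^* \sigma^2 / N},$$

where $x^* \approx 0.2032$ is the solution of (5).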

Corollary 1: For $D \in (0, D^*)$, sparse regression codes
achieve the optimal Wyner-Ziv rate-distortion function for
Gaussian sources given by $\frac{1}{2} \log \frac{\mathrm{Var}(X|Y)}{D}$.

Proof: It can be verified that for $D \in (0, D^*)$, the lower
bound on $R_1$ specified by the theorem becomes $\frac{1}{2} \log \frac{\sigma^2 + Q}{Q}$.
The corollary then follows by choosing $R_1 = \frac{1}{2} \log \frac{\sigma^2 + Q}{Q} + \epsilon$
and $R_2 = \frac{1}{2} \log(1 + \mathrm{snr}) - \epsilon$. This yields an achievable rate
$R_1 - R_2 = \frac{1}{2} \log \frac{\mathrm{Var}(X|Y)}{D} + 2\epsilon$, where $\epsilon > 0$ can be arbitrarily
small. ■

Proof of Theorem 1:

Fix block length n and rates $R_1, R_2$. Choose an $n \times ML$
design matrix A with $M = L^b$ and $bL \log L = nR_1$, where b is
greater than the minimum value specified by the theorem. Each
section of A is partitioned into sub-sections of M' columns
each, where $(M')^L = e^{nR_2}$.

The U-codebook consists of all vectors $A\beta$ such that $\beta \in
\mathcal{B}_{M,L}$ and the non-zero entries in $\beta$ are all equal to $\sqrt{(\sigma^2 + Q)/L}$.
Let $a = \frac{\sigma^2}{\sigma^2 + Q}$.

Encoder: Given source sequence X, find the codeword U
from the SPARC such that aU is closest to X in Euclidean
distance. Specifically, determine

$$\beta^* = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M,L}} \|X - aA\beta\|^2.$$

Send the decoder a tuple $(p_1, \ldots, p_L)$ where $p_i \in \{1, \ldots, M/M'\}$
indicates the subsection in the ith section of A where $\beta^*$
contains a non-zero element. The rate to the decoder is

$$\frac{L}{n} \log \frac{M}{M'} = R_1 - R_2.$$



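Decoder: The decoder first determines the $n \times M'L$ sub-matrix
corresponding to the subsections of A indicated by
$(p_1, \ldots, p_L)$. We denote this sub-matrix $A_{bin}$. $A_{bin}$ defines a
SPARC $A_{bin}\beta$ where $\beta \in \mathcal{B}_{M',L}$ contains one non-zero value
equal to $\sqrt{(\sigma^2 + Q)/L}$ in each of its L sections. The decoder
now reconstructs $\hat{U} = A_{bin}\hat{\beta}$ where

$$\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M',L}} \|Y - aA_{bin}\beta\|^2. \tag{14}$$

Finally, the source sequence is reconstructed as the MMSE estimate of X given $(\hat{U}, Y)$:

$$\hat{X} = \sigma^2 \begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{bmatrix} \sigma^2 + Q & \sigma^2 \\ \sigma^2 & \sigma^2 + N \end{bmatrix}^{-1}
\begin{bmatrix} \hat{U} \\ Y \end{bmatrix}. \tag{15}$$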







Error Analysis: Let $\delta > 0$ be such that

$$R_1 > \max\Big\{ \frac{1}{2} \log \frac{(\sigma^2 + Q)(1 + \delta)}{Q},\ 1 - \frac{Q}{(\sigma^2 + Q)(1 + \delta)} \Big\}. \tag{16}$$

The probability of the error event $\mathcal{E}$ can be decomposed as
$P(\mathcal{E}) = P(\mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3)$ where $\mathcal{E}_1$ is the event that $|X|^2 >
\sigma^2(1 + \delta)$, $\mathcal{E}_2$ is the event of error at the encoder, and $\mathcal{E}_3$ the
event of error at the decoder. We have

$$P(\mathcal{E}_1) = P\big(|X|^2 > \sigma^2(1 + \delta)\big) < \frac{\epsilon}{3} \tag{17}$$

for sufficiently large n from standard results on large
deviations [36]. Next, we have

$$P(\mathcal{E}_2 \mid \mathcal{E}_1^c) = P\Big( \min_{\beta \in \mathcal{B}_{M,L}} |X - aA\beta|^2 > \frac{\sigma^2 Q}{\sigma^2 + Q} \Big) < \frac{\epsilon}{3} \tag{18}$$

for sufficiently large n. This follows from Fact 1 since $R_1$
and b satisfy the conditions specified in Fact 1 for compressing
source sequences of variance up to $\sigma^2(1 + \delta)$ at distortion-level
$\frac{\sigma^2 Q}{\sigma^2 + Q}$. Finally, we bound

$$P(\mathcal{E}_3 \mid \mathcal{E}_2^c, \mathcal{E}_1^c) = P\Big( \operatorname*{argmin}_{\beta \in \mathcal{B}_{M',L}} \|Y - aA_{bin}\beta\|^2 \ne \beta^* \Big).$$

Let the number of columns in each sub-section be $M' = L^{b'}$.
Using $(M')^L = e^{nR_2}$, we have

$$b' = \frac{nR_2}{L \log L} = b\,\frac{R_2}{R_1} > b_0(\mathrm{snr}), \tag{19}$$

where the last inequality is due to the minimum value of b
specified by the theorem. Since

$$R_2 < \frac{1}{2} \log(1 + \mathrm{snr})$$

and $b' > b_0(\mathrm{snr})$, the $n \times M'L$ design matrix $A_{bin}$ satisfies
the conditions of Fact 2 for signal-to-noise ratio given by
(13). Hence for sufficiently large n, $P(\mathcal{E}_3 \mid \mathcal{E}_1^c, \mathcal{E}_2^c) < \epsilon/3$.
Combining this with (17) and (18), we have $P(\mathcal{E}) < \epsilon$. ■

V. SPARC for Writing on Dirty Paper

The AWGN channel with state is defined by the relation
$Y = X + S + Z$, where the state $S \sim \mathcal{N}(0, \sigma_s^2)$ is independent
of the additive noise $Z \sim \mathcal{N}(0, N)$. There is an average power
constraint P on the input sequence X. The state sequence S,
i.i.d. $\mathcal{N}(0, \sigma_s^2)$, is known non-causally at the encoder.

We first review the main ideas behind Costa's capacity-achieving
coding scheme [20] for this channel. The state
sequence S (known only at the encoder) is used in two ways:
part of it is used for coding and the rest is treated as noise.
Define an auxiliary random variable U as

$$U = X + \alpha S, \tag{20}$$

where $X \sim \mathcal{N}(0, P)$ is independent of S and $\alpha \in (0, 1)$ is
a constant specified later. The channel codebook consists of
$e^{nR_1}$ U-sequences chosen i.i.d. $\mathcal{N}(0, P + \alpha^2 \sigma_s^2)$. We divide this
codebook into $e^{nR}$ equal-sized bins with each bin representing
a message. To transmit message $m \in \{1, \ldots, e^{nR}\}$, the
encoder observes the state sequence S and attempts to find a
codeword U within bin m whose empirical joint distribution
with S is close to (20). From rate-distortion theory, this step
will be successful if the number of sequences in each bin,
$e^{n(R_1 - R)}$, is larger than $e^{nI(U;S)}$. The encoder then forms the
channel input sequence X as $U - \alpha S$.

The decoder receives the output sequence Y according to

$$Y = X + S + Z = U + (1 - \alpha) S + Z \tag{21}$$

and attempts to decode U. This is effectively an AWGN
channel decoding operation, which will be successful if $R_1 <
I(U;Y)$. Combining this with the lower bound $R_1 - R >
I(U;S)$, we see that any rate $R < I(U;Y) - I(U;S)$ is
achievable. The right side of the inequality is equal to the
channel capacity $\frac{1}{2} \log(1 + P/N)$ for the joint distribution
given by (20) and (21) for $\alpha = \frac{P}{P + N}$.

We now show how to implement the above coding scheme
with a nested SPARC. Define a nested SPARC with rates
$(R_1, R_1 - R)$ through an $n \times ML$ matrix A with $M^L = e^{nR_1}$.
As in Section IV, each bin corresponds to a SPARC defined by
a sub-matrix of A, consisting of L subsections of M' columns.
We note that $(M')^L = e^{n(R_1 - R)}$, which implies that the number
of bins is $(M/M')^L = e^{nR}$. Thus each message indexes a
unique bin of the nested SPARC or, equivalently, a unique
sub-matrix of A.

The relation (20) can be equivalently written in terms of
the reverse test channel as

$$S = \kappa U + X', \tag{22}$$

where $\kappa = \alpha \sigma_s^2 / (P + \alpha^2 \sigma_s^2)$ and $X' \sim \mathcal{N}\big(0, \frac{P \sigma_s^2}{P + \alpha^2 \sigma_s^2}\big)$ is
independent of U. Given the message and state sequence
S, the encoder needs to quantize the state sequence S to a
codeword $\kappa U$ (within the bin indexed by the message) with
mean-squared distortion at most Var(X'). The SPARC defined
by the message bin can reliably perform this quantization if
the corresponding sub-matrix of A has parameters satisfying
the specifications in Fact 1. Using (22), the channel law (21)
can be written as

$$Y = U + (1 - \alpha) S + Z = U + (1 - \alpha) \kappa U + (1 - \alpha) X' + Z, \tag{23}$$

where U, X', Z are mutually independent. Thus the decoder
has to recover the codeword U transmitted over a channel with
effective signal-to-noise ratio

$$\mathrm{snr} = \frac{(1 + (1 - \alpha)\kappa)^2\, \mathrm{Var}(U)}{(1 - \alpha)^2\, \mathrm{Var}(X') + N}
= \frac{(1 + (1 - \alpha)\kappa)^2 (P + \alpha^2 \sigma_s^2)}{\frac{(1 - \alpha)^2 P \sigma_s^2}{P + \alpha^2 \sigma_s^2} + N}. \tag{24}$$

For all $R_1 < \frac{1}{2} \log(1 + \mathrm{snr})$, this step is successful with high
probability if the design matrix A satisfies the specifications
in Fact 2.
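
The quantities in the reverse test channel and the effective signal-to-noise ratio can be evaluated directly. The sketch below assumes the Gaussian model of this section and uses I(U;S) = (1/2) log(1 + alpha^2 sigma_s^2 / P); the function name and the numerical values are illustrative. With alpha = P/(P+N), the difference (1/2) log(1 + snr) - I(U;S) should match the capacity (1/2) log(1 + P/N).

```python
import math

def dirty_paper_quantities(P, N, sigma_s2, alpha):
    """Return (kappa, snr, achievable rate) for Costa's scheme with parameter alpha."""
    var_u = P + alpha ** 2 * sigma_s2            # Var(U) for U = X + alpha * S
    kappa = alpha * sigma_s2 / var_u             # coefficient in S = kappa * U + X'
    var_x_prime = P * sigma_s2 / var_u           # Var(X') = Var(S | U)
    gain = 1 + (1 - alpha) * kappa               # coefficient of U in the output Y
    snr = gain ** 2 * var_u / ((1 - alpha) ** 2 * var_x_prime + N)
    i_uy = 0.5 * math.log(1 + snr)               # rate of the full U-codebook
    i_us = 0.5 * math.log(var_u / P)             # rate needed inside each bin
    return kappa, snr, i_uy - i_us

P, N, sigma_s2 = 1.0, 1.0, 1.0                   # illustrative values only
capacity = 0.5 * math.log(1 + P / N)

kappa, snr, rate = dirty_paper_quantities(P, N, sigma_s2, alpha=P / (P + N))
_, _, rate_other = dirty_paper_quantities(P, N, sigma_s2, alpha=0.3)
print(kappa, snr, rate, capacity, rate_other)
```

For this example, a different value of alpha gives a strictly smaller rate, consistent with alpha = P/(P+N) being the capacity-achieving choice.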

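The performance of this coding scheme is formalized in the
following theorem.

Theorem 2: Fix $\alpha \in (0, 1)$. Let $\kappa = \frac{\alpha \sigma_s^2}{P + \alpha^2 \sigma_s^2}$ and snr
be defined by (24). Fix $R_1 < \frac{1}{2} \log(1 + \mathrm{snr})$ and $R_2 >
\max\big\{ \frac{1}{2} \log\big(1 + \frac{\alpha^2 \sigma_s^2}{P}\big),\ \frac{\alpha^2 \sigma_s^2}{P + \alpha^2 \sigma_s^2} \big\}$ such that $R_1 > R_2$. Then for any
$\epsilon > 0$ and all sufficiently large n, there
exists a rate $R_1 - R_2$ code $\mathcal{C}_n$ with $P_e(\mathcal{C}_n) < \epsilon$ where $\mathcal{C}_n$
is defined by a nested sparse regression codebook with rates
$(R_1, R_2)$ whose $n \times ML$ design matrix satisfies the following:
$M = L^b$ where

$$b > \max\Big\{ \frac{2.5 R_1}{R_2 - \alpha^2 \sigma_s^2 / (P + \alpha^2 \sigma_s^2)},\ b_0(\mathrm{snr}) \Big\}$$

and L is determined by $bL \log L = nR_1$. (The function $b_0(\cdot)$
is defined in (8).)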

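Corollary 2: Let $x^* \approx 0.2032$ be the solution of the
equation (5). For $\frac{P \sigma_s^2}{(P + N)^2} > \frac{1}{x^*} - 1$, sparse regression codes
achieve the channel capacity $\frac{1}{2} \log\big(1 + \frac{P}{N}\big)$.

Proof: It can be verified that when we choose $\alpha =
P/(P + N)$ and $P, N, \sigma_s^2$ satisfy the above condition, the
lower bound on $R_2$ specified by the theorem becomes
$\frac{1}{2} \log\big(1 + \frac{\alpha^2 \sigma_s^2}{P}\big)$. The corollary then follows by choosing
$R_1 = \frac{1}{2} \log(1 + \mathrm{snr}) - \epsilon$ and $R_2 = \frac{1}{2} \log\big(1 + \frac{\alpha^2 \sigma_s^2}{P}\big) + \epsilon$.
This yields an achievable rate $R_1 - R_2 = \frac{1}{2} \log(1 + P/N) - 2\epsilon$,
where $\epsilon > 0$ can be arbitrarily small. ■

Proof of Theorem 2: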

Fix block length n and rates $R_1, R_2$. Choose an $n \times ML$
design matrix A with $M = L^b$ and b greater than the minimum
value specified by the theorem. The U-codebook consists of
all vectors $A\beta$ such that $\beta \in \mathcal{B}_{M,L}$ and the non-zero entries
in $\beta$ are all equal to $\sqrt{(P + \alpha^2 \sigma_s^2)/L}$.

We have $M^L = e^{nR_1}$, and each section of A is partitioned
into sub-sections of M' columns each where $(M')^L = e^{nR_2}$.
Each of the $e^{n(R_1 - R_2)}$ messages corresponds to a unique tuple
$(p_1, \ldots, p_L)$ where $p_i \in \{1, \ldots, M/M'\}$.

Encoder: The message $(p_1, \ldots, p_L)$ indexes an $n \times M'L$
sub-matrix of A, denoted by $A_{bin}$. Find the
codeword U from the SPARC $A_{bin}$ such that $\kappa U$ is closest
to S in Euclidean distance. Specifically, determine

$$\beta^* = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M',L}} \|S - \kappa A_{bin}\beta\|^2$$

and transmit

$$X = A_{bin}\beta^* - \alpha S.$$

Decoder: Given channel output Y, find the codeword U
from the SPARC such that $(1 + (1 - \alpha)\kappa)U$ is closest to Y
in Euclidean distance. Specifically, determine

$$\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathcal{B}_{M,L}} \|Y - (1 + (1 - \alpha)\kappa)A\beta\|^2.$$

Decode the message as $(\hat{p}_1, \ldots, \hat{p}_L)$ where $\hat{p}_i \in \{1, \ldots, M/M'\}$
indicates the subsection in the ith section of A where $\hat{\beta}$
contains a non-zero element.

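Error Analysis: Let $\delta > 0$ be such that

$$R_2 > \max\Big\{ \frac{1}{2} \log\Big( \Big(1 + \frac{\alpha^2 \sigma_s^2}{P}\Big)(1 + \delta) \Big),\ 1 - \frac{P}{(P + \alpha^2 \sigma_s^2)(1 + \delta)} \Big\}. \tag{25}$$

The probability of error can be decomposed as $P(\mathcal{E}) =
P(\mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3)$ where $\mathcal{E}_1$ is the event that $|S|^2 > \sigma_s^2(1 + \delta)$,
$\mathcal{E}_2$ is the event of error at the encoder, and $\mathcal{E}_3$ the event of
error at the decoder. We have

$$P(\mathcal{E}_1) = P\big(|S|^2 > \sigma_s^2(1 + \delta)\big) < \frac{\epsilon}{3} \tag{26}$$

for sufficiently large n. Letting $M' = L^{b'}$, we have

$$b' = \frac{nR_2}{L \log L} = b\,\frac{R_2}{R_1} > \frac{2.5 R_2}{R_2 - \alpha^2 \sigma_s^2 / (P + \alpha^2 \sigma_s^2)}, \tag{27}$$

where the last inequality is due to the minimum value of b
specified by the theorem. We then have

$$P(\mathcal{E}_2 \mid \mathcal{E}_1^c) = P\Big( \min_{\beta \in \mathcal{B}_{M',L}} |S - \kappa A_{bin}\beta|^2 > \frac{P \sigma_s^2}{P + \alpha^2 \sigma_s^2} \Big) < \frac{\epsilon}{3} \tag{28}$$

for sufficiently large n. This follows from Fact 1 since $R_2$ and
b' satisfy the conditions specified in Fact 1 for compressing
sequences S of variance up to $\sigma_s^2(1 + \delta)$ at distortion-level
$\frac{P \sigma_s^2}{P + \alpha^2 \sigma_s^2}$. Finally, $P(\mathcal{E}_3 \mid \mathcal{E}_1^c, \mathcal{E}_2^c)$ is given by

$$P\Big( \operatorname*{argmin}_{\beta \in \mathcal{B}_{M,L}} \|Y - (1 + (1 - \alpha)\kappa)A\beta\|^2 \ne \beta^* \Big) < \frac{\epsilon}{3} \tag{29}$$

from Fact 2 since $R_1 < \frac{1}{2} \log(1 + \mathrm{snr})$ and $b > b_0(\mathrm{snr})$ for
snr given by (24). Combining (26), (28) and (29), we conclude
that $P(\mathcal{E}) < \epsilon$. ■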

VI. SPARC for Gaussian Multiuser Channels 

A. The AWGN Multiple-Access Channel 

In a multiple-access channel (MAC), several users simul- 
taneously transmit to a single receiver. For simplicity let us 
consider the case of two users, each with average power 
constraint P, transmitting information at rates Ri and R 2 , 
respectively. The receiver of the AWGN MAC observes the 
output 

Y = Xi + X 2 + Z 

where Xi , X 2 denote the channel inputs of the two transmit- 
ters and Z <~ Af(0, N) is the channel noise independent of X\ 
and X 2 . The capacity region for this channel is well-known 



1 35 1 and is given by 
1 



P 



Ri < - log(l + N 



R2 < 2 log(l 



N' 



1 2P 

Ri + R 2 < -log(l + — ). 



(30) 



We now show how to achieve the corner points of this rate 
region using SPARCs. The remaining rate points in the region 
can be achieved through time-sharing. The key observation is 
that the corner points can be achieved using a pair of
point-to-point channel codes [37].
Consider a rate pair

$$R_1 < \frac{1}{2}\log\left(1 + \frac{P}{P + N}\right), \quad R_2 < \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$
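
As a numerical sanity check on this corner point, the following sketch evaluates the bounds of the two-user AWGN MAC region and confirms that the corner rate for the user decoded first equals the sum-rate bound minus the single-user bound. The function name `mac_region` is hypothetical, and logarithms are taken to base 2 (rates in bits per channel use), which is an assumption made to match $M^L = 2^{nR}$.

```python
import math

def mac_region(P, N):
    """Bounds and corner points of the two-user AWGN MAC region (equal powers).

    Returns (r_single, r_sum, corner_a, corner_b), rates in bits per channel use.
    corner_a: user 1 is decoded first (treating user 2 as noise).
    corner_b: user 2 is decoded first.
    """
    r_single = 0.5 * math.log2(1 + P / N)      # individual rate bound
    r_sum = 0.5 * math.log2(1 + 2 * P / N)     # sum-rate bound
    r_first = 0.5 * math.log2(1 + P / (P + N)) # rate of the user decoded first
    corner_a = (r_first, r_single)
    corner_b = (r_single, r_first)
    return r_single, r_sum, corner_a, corner_b

if __name__ == "__main__":
    r_single, r_sum, corner_a, corner_b = mac_region(P=1.0, N=0.25)
    # Each corner point lies on the sum-rate boundary.
    print(abs(sum(corner_a) - r_sum) < 1e-12)
```

The identity behind the check is $\frac{1}{2}\log(1 + \frac{2P}{N}) - \frac{1}{2}\log(1 + \frac{P}{N}) = \frac{1}{2}\log(1 + \frac{P}{P+N})$.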



Fix codeword length $n$ and choose a rate $R_1$ SPARC for
transmitter 1 using an $n \times M_1 L_1$ design matrix $A_1$, with
$M_1^{L_1} = 2^{nR_1}$ and $A_1$ satisfying the specifications of Fact 2 for
signal-to-noise ratio $\frac{P}{P+N}$. Similarly, choose a rate $R_2$ SPARC
for transmitter 2 using an $n \times M_2 L_2$ design matrix $A_2$, with
$M_2^{L_2} = 2^{nR_2}$ and $A_2$ satisfying the specifications of Fact 2
for signal-to-noise ratio $\frac{P}{N}$. The codewords $X_1 = A_1\beta_1$ and
$X_2 = A_2\beta_2$, chosen by the respective users, are transmitted
through the channel. The non-zero values in both $\beta_1$ and $\beta_2$
are set to $\sqrt{P/L}$. The receiver obtains

$$Y = A_1\beta_1 + A_2\beta_2 + Z$$



and uses a successive cancellation strategy. It first decodes the
message of transmitter 1 as $\hat{\beta}_1$, effectively treating $A_2\beta_2 + Z$ as
noise. This step will be successful with high probability since
the signal-to-noise ratio is $P/(P + N)$ and $R_1 < \frac{1}{2}\log(1 +
\frac{P}{P+N})$. In the second step, the receiver decodes $\beta_2$ from the
residue $Y - A_1\hat{\beta}_1$, which equals $A_2\beta_2 + Z$ if the first step was
successful, i.e., $\hat{\beta}_1 = \beta_1$. The second step will be successful
with high probability since $R_2 < \frac{1}{2}\log(1 + \frac{P}{N})$. The other
corner point of the rate region can be achieved by exchanging
the roles of $X_1$ and $X_2$.
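
The two-step receiver can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the design matrices have i.i.d. standard Gaussian entries, both users use the same $L$ and $M$, the code sizes are tiny so that minimum-distance decoding can be done by exhaustive search, and all function names are hypothetical. It is not the computationally feasible decoder, and the rates used are far below the corner point.

```python
import itertools
import math
import random

def sparc_codebook(L, M, value):
    """All coefficient vectors with exactly one non-zero entry per section."""
    book = []
    for picks in itertools.product(range(M), repeat=L):
        beta = [0.0] * (L * M)
        for sec, col in enumerate(picks):
            beta[sec * M + col] = value
        book.append(beta)
    return book

def encode(A, beta):
    """Codeword A * beta, with A stored as a list of n rows."""
    return [sum(a * b for a, b in zip(row, beta)) for row in A]

def min_distance_decode(y, A, book):
    """Exhaustive minimum-distance decoding: coefficient vector closest to y."""
    def dist(beta):
        return sum((yi - xi) ** 2 for yi, xi in zip(y, encode(A, beta)))
    return min(book, key=dist)

def simulate_mac(n=128, L=2, M=4, P=1.0, N=0.25, seed=0):
    """Return (user 1 decoded correctly, user 2 decoded correctly)."""
    rng = random.Random(seed)

    def gaussian_matrix():
        # Assumption: i.i.d. N(0, 1) entries, so codewords have power about P.
        return [[rng.gauss(0.0, 1.0) for _ in range(L * M)] for _ in range(n)]

    A1, A2 = gaussian_matrix(), gaussian_matrix()
    book = sparc_codebook(L, M, math.sqrt(P / L))
    beta1, beta2 = rng.choice(book), rng.choice(book)
    x1, x2 = encode(A1, beta1), encode(A2, beta2)
    y = [a + b + rng.gauss(0.0, math.sqrt(N)) for a, b in zip(x1, x2)]

    # Step 1: decode user 1, treating user 2's codeword plus noise as noise.
    beta1_hat = min_distance_decode(y, A1, book)
    # Step 2: subtract the decoded codeword and decode user 2 from the residue.
    residue = [yi - xi for yi, xi in zip(y, encode(A1, beta1_hat))]
    beta2_hat = min_distance_decode(residue, A2, book)
    return beta1_hat == beta1, beta2_hat == beta2
```

Calling `simulate_mac()` returns a pair of booleans indicating whether each user's message was recovered. With the default toy parameters, each user's rate is $L \log_2 M / n = 4/128$ bits per channel use, so both steps should succeed with very high probability.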

B. The AWGN Broadcast Channel 

Consider the two-receiver scalar Gaussian broadcast channel 
where the outputs of the two receivers are related to the 
channel input X as 

$$Y_1 = X + Z_1, \quad Y_2 = X + Z_2.$$

$X$ has average power constraint $P$ and the channel noises
$Z_1, Z_2$ are independent zero mean Gaussian random variables
with variances $N_1$ and $N_2$, respectively. Without loss of
generality, we assume that $N_2 > N_1$. The capacity region
for this channel is given by [35]

$$R_1 < \frac{1}{2}\log\left(1 + \frac{\alpha P}{N_1}\right), \quad R_2 < \frac{1}{2}\log\left(1 + \frac{(1 - \alpha) P}{\alpha P + N_2}\right). \qquad (31)$$
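
To make the role of the power-split parameter concrete, the sketch below evaluates the two bounds in (31) for a given $\alpha$. It assumes $\alpha \in [0, 1]$ and base-2 logarithms (bits per channel use); the function name `broadcast_rate_bounds` is hypothetical.

```python
import math

def broadcast_rate_bounds(P, N1, N2, alpha):
    """Rate bounds of the scalar Gaussian broadcast channel for a power split.

    alpha is the fraction of the power P assigned to receiver 1's message
    (the receiver with the smaller noise variance N1). Rates are in bits.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    r1 = 0.5 * math.log2(1 + alpha * P / N1)
    # Receiver 2 sees receiver 1's signal (power alpha * P) as extra noise.
    r2 = 0.5 * math.log2(1 + (1 - alpha) * P / (alpha * P + N2))
    return r1, r2

if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, broadcast_rate_bounds(P=1.0, N1=0.1, N2=1.0, alpha=alpha))
```

Sweeping $\alpha$ from 0 to 1 traces the boundary of the region: $\alpha = 0$ gives all the power to receiver 2's message, and $\alpha = 1$ gives all of it to receiver 1's message.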

This capacity region is achievable through a SPARC coding
scheme similar to the one for the AWGN MAC. Consider a
rate pair $(R_1, R_2)$ satisfying (31). Fix codeword length $n$ and
choose a rate $R_1$ SPARC for user 1 using an $n \times M_1 L_1$
design matrix $A_1$, with $M_1^{L_1} = 2^{nR_1}$ and $A_1$ satisfying the
specifications of Fact 2 for signal-to-noise ratio $\frac{\alpha P}{N_1}$. Similarly,
choose a rate $R_2$ SPARC for user 2 using an $n \times M_2 L_2$
design matrix $A_2$, with $M_2^{L_2} = 2^{nR_2}$ and $A_2$ satisfying the
specifications of Fact 2 for signal-to-noise ratio $\frac{(1-\alpha)P}{\alpha P + N_2}$. The
input sequence is generated as $X = A_1\beta_1 + A_2\beta_2$ where
$\beta_1, \beta_2$ represent the messages of the two users. The non-zero
values in $\beta_1$ and $\beta_2$ are set to $\sqrt{\alpha P/L}$ and $\sqrt{(1 - \alpha)P/L}$,
respectively. The receivers obtain

$$Y_1 = A_1\beta_1 + A_2\beta_2 + Z_1, \quad Y_2 = A_1\beta_1 + A_2\beta_2 + Z_2.$$
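
The effect of the two non-zero values on the transmitted power can be checked empirically. The sketch below assumes design matrices with i.i.d. standard Gaussian entries and the same $L$ and $M$ for both codes; under that assumption the two components of the input should have average power close to $\alpha P$ and $(1 - \alpha)P$, and their sum close to $P$. The function name and default parameters are hypothetical.

```python
import math
import random

def superposition_powers(n=1000, L=4, M=8, P=1.0, alpha=0.3, seed=0):
    """Empirical average power of A1*beta1, A2*beta2 and their sum X."""
    rng = random.Random(seed)

    def design_matrix():
        # Assumption: i.i.d. N(0, 1) entries.
        return [[rng.gauss(0.0, 1.0) for _ in range(L * M)] for _ in range(n)]

    def random_beta(value):
        # One non-zero entry, equal to `value`, in each of the L sections.
        beta = [0.0] * (L * M)
        for sec in range(L):
            beta[sec * M + rng.randrange(M)] = value
        return beta

    def multiply(A, beta):
        return [sum(a * b for a, b in zip(row, beta)) for row in A]

    def power(x):
        return sum(v * v for v in x) / len(x)

    x1 = multiply(design_matrix(), random_beta(math.sqrt(alpha * P / L)))
    x2 = multiply(design_matrix(), random_beta(math.sqrt((1 - alpha) * P / L)))
    x = [a + b for a, b in zip(x1, x2)]
    return power(x1), power(x2), power(x)
```

Because the two design matrices are generated independently, the cross term between the two components averages out, which is why the total power is close to the sum of the component powers.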

Receiver 2 decodes $\beta_2$ treating $A_1\beta_1 + Z_2$ as noise.
This will be successful with high probability since $R_2 <
\frac{1}{2}\log\left(1 + \frac{(1-\alpha)P}{\alpha P + N_2}\right)$. Receiver 1 can also decode $\beta_2$ with
high probability since its noise variance $N_1 < N_2$. Hence the residue
$Y_1 - A_2\hat{\beta}_2$ at receiver 1 will be equal to $A_1\beta_1 + Z_1$ with
high probability. Receiver 1 can then reliably decode $\beta_1$ from
this residue since the rate $R_1 < \frac{1}{2}\log(1 + \frac{\alpha P}{N_1})$.

VII. Conclusion 

The results of [14]-[19] showed that the sparse regression
framework can be used to design rate-optimal codes for 
point-to-point source and channel coding with computationally 
efficient encoders and decoders. In this paper, we showed 
how these source and channel codes can be combined to 
implement random binning and superposition. These two
techniques have been fundamental ingredients of rate-optimal
coding schemes for a wide range of problems in network 
information theory. The next goal is a precise performance 
analysis of the computationally feasible versions of the coding 
schemes presented here. Constructing a library of efficient 
sparse regression modules to perform source coding, channel 
coding, binning and superposition will pave the way for 
fast, rate-optimal codes for several network problems such as 
multiple description coding, lossy distributed source coding, 
interference channels and relay channels. 
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